How we’ve interacted with data dictates how we structure data mentally.
A value is defined by its:
Values are distinguished by similarities and differences in their multi-dimensional states, attributes, and contexts.
In R, the most commonly used types of values are:
Number values can be either:
# Create a vector of numeric values:
numericV <- c(3, 2, 1, 1)
numericV
# What type of object is this?
class(numericV)
str(numericV)
summary(numericV)
# Create a vector of numeric integer values:
numericInteger <- 1:5
numericInteger
# What type of object is this?
class(numericInteger)
str(numericInteger)
summary(numericInteger)
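The two numeric examples above differ in how R stores them. The following is a small sketch (the variable names `x` and `y` are illustrative, not from the slides) showing that numbers typed without a suffix are stored as doubles, while the `L` suffix or the `:` operator produces integers:

```r
# Numbers typed without a suffix are stored as doubles.
x <- c(3, 2, 1, 1)
typeof(x)        # "double"
is.integer(x)    # FALSE

# The L suffix (or the : operator) creates integers.
y <- c(3L, 2L, 1L, 1L)
typeof(y)        # "integer"
typeof(1:5)      # "integer"

# Both count as numeric in the broad sense.
is.numeric(x)    # TRUE
is.numeric(y)    # TRUE
```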
A character, or “string”, value is a symbol or sequence of symbols drawn from a given character set.
# Create a vector of character values:
exampleCharacter <- c('three', 'two', 'one', 'one')
exampleCharacter
# What type of object is this?
class(exampleCharacter)
str(exampleCharacter)
summary(exampleCharacter)
A factor value includes the following information:
# Create a vector of factor values:
exampleFactor <- factor(c('three', 'two', 'one', 'one'))
exampleFactor
# What type of object is this?
class(exampleFactor)
str(exampleFactor)
summary(exampleFactor)
# Set factor levels and labels:
factor(c('three', 'two', 'one', 'one'))
factor(
c('three', 'two', 'one', 'one'),
levels = c('one', 'two', 'three')
)
factor(
c('three', 'two', 'one', 'one'),
levels = c('one', 'two', 'three'),
labels = c('One', 'Two', 'Three')
)
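To see why setting levels matters, it can help to look at the integer codes a factor stores underneath its labels. This is a minimal sketch (the names `f` and `g` are illustrative) comparing explicit levels with the default alphabetical ordering:

```r
# Explicit levels: codes follow the order we chose.
f <- factor(c('three', 'two', 'one', 'one'),
            levels = c('one', 'two', 'three'))
levels(f)       # "one" "two" "three"
as.integer(f)   # 3 2 1 1

# Default levels: sorted alphabetically, so the codes change.
g <- factor(c('three', 'two', 'one', 'one'))
levels(g)       # "one" "three" "two"
as.integer(g)   # 2 3 1 1
```

The printed values look the same in both cases, but anything that depends on level order (sorting, plotting, model contrasts) will behave differently.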
R reserves the words TRUE and FALSE as logical constants. These constants are mapped to integer values:
# Observe the behavior of logical values:
FALSE
TRUE
as.numeric(FALSE)
as.numeric(TRUE)
FALSE + TRUE
FALSE + TRUE + TRUE
Logical values can be obtained by evaluating objects with logical operators. For example, the logical operator == tests whether a value is equal to another value.
# The "is equal to" logical operator:
3 == 3
3 == 4
3 == 2 + 1
3 == 3 + 1
(3 == 3) + (3 == 2 + 1)
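Because TRUE counts as 1 and FALSE as 0, logical tests on a whole vector can be summed or averaged. A short sketch, using an illustrative vector `v` and result `isOne` that are not part of the slides:

```r
v <- c(3, 2, 1, 1)

# The comparison is applied to every element.
isOne <- v == 1
isOne           # FALSE FALSE  TRUE  TRUE

# Count and proportion of elements that pass the test.
sum(isOne)      # 2
mean(isOne)     # 0.5

# A logical vector can also be used to subset.
v[isOne]        # 1 1
```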
In R, containers called objects structure collections of values. Different types of objects store values in different ways:
| Object dimensions | Homogeneous class | Heterogeneous class |
|---|---|---|
| 1-D | Atomic vector | List |
| 2-D | Matrix | Data frame |
For each object type, we’ll address:
An atomic vector is a one-dimensional collection of values. All values must be of the same class.
Each value in a vector has a position, denoted by “[x]”
| [1] | [2] | [3] | [4] |
|---|---|---|---|
| 1 | 1 | 2 | 3 |
# A vector of numeric values:
numericVector <- c(1, 1, 2, 3)
numericVector
## [1] 1 1 2 3
summary(numericVector)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 1.00 1.50 1.75 2.25 3.00
# All values in a vector must be of the same class:
numericVector
## [1] 1 1 2 3
messyVector <- c(1, 'one', 2, 3)
messyVector
## [1] "1" "one" "2" "3"
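The coercion shown above follows a fixed order: logical, then integer, then double, then character. The sketch below (with an illustrative vector named `messy`) shows each step, and what happens when you try to convert the mixed vector back to numbers:

```r
# Mixed inputs are coerced to the most flexible type present.
typeof(c(TRUE, 2L))     # "integer"
typeof(c(2L, 1.5))      # "double"
typeof(c(1.5, 'one'))   # "character"

# Converting back: values that are not numbers become NA
# (R also issues a warning, suppressed here for clarity).
messy <- c(1, 'one', 2, 3)
suppressWarnings(as.numeric(messy))   # 1 NA 2 3
```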
Each value in a vector has a position, denoted by “[x]”
| [1] | [2] | [3] | [4] |
|---|---|---|---|
| 1 | 1 | 2 | 3 |
# Use indexing to subset a vector:
numericVector
numericVector[3]
numericVector[3:4]
numericVector[c(1,3)]
Attributes of vectors that we are typically interested in include:
# Attributes of the vector:
class(numericVector)
length(numericVector)
str(numericVector)
Attributes can be added to vectors.
# Adding attributes to a vector:
numericVector
names(numericVector)
names(numericVector) <- c('orange', 'pear', 'apple', 'apple')
Vectors can be indexed by their names attribute.
| [‘orange’] | [‘pear’] | [‘apple’] | [‘apple’] |
|---|---|---|---|
| 1 | 1 | 2 | 3 |
numericVector[2]
numericVector['pear']
numericVector[2] == numericVector['pear']
numericVector[c('orange', 'pear')]
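Note that the names used above are not unique: 'apple' appears twice. As a small illustration (using a separate vector called `fruit` so the example stands alone), indexing by a duplicated name returns only the first match:

```r
fruit <- c(orange = 1, pear = 1, apple = 2, apple = 3)

# Indexing by name returns the FIRST element with that name.
fruit['apple']                    # apple 2

# To get every match, compare against the names directly.
fruit[names(fruit) == 'apple']    # apple 2, apple 3

# unname() drops the names if you only want the value.
unname(fruit['pear'])             # 1
```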
A matrix is a two-dimensional object: essentially a vector whose values have been arranged into rows and columns. All values must be of the same class.
Values in a matrix have a row (x) and column (y) position, denoted by “[x, y]”
| | [ ,1] | [ ,2] |
|---|---|---|
| [1, ] | 1 | 2 |
| [2, ] | 1 | 3 |
# Generate matrix:
m <- matrix(c(1, 1, 2, 3), ncol = 2)
m
## [,1] [,2]
## [1,] 1 2
## [2,] 1 3
A vector can be structured horizontally (row-wise) or vertically (column-wise) within a matrix:
# Compare matrices built row-wise and column-wise:
matrix(c(1, 1, 2, 3), ncol = 2, byrow = TRUE)
## [,1] [,2]
## [1,] 1 1
## [2,] 2 3
matrix(c(1, 1, 2, 3), ncol = 2, byrow = FALSE)
## [,1] [,2]
## [1,] 1 2
## [2,] 1 3
Because matrices must be homogeneous, all values are forced to be the same type.
# Matrix built with multiple types:
messyMatrix <- matrix(c(1, 'one', 2, 3), ncol = 2)
messyMatrix
## [,1] [,2]
## [1,] "1" "2"
## [2,] "one" "3"
Values in a matrix have a row (x) and column (y) position, denoted by “[x, y]”
| | [ ,1] | [ ,2] |
|---|---|---|
| [1, ] | 1 | 2 |
| [2, ] | 1 | 3 |
# Index by row (x) and column (y) position [x,y]:
m[1,1]
m[2,2]
m[1:2,2]
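Leaving one side of the comma empty selects a whole row or column. The sketch below rebuilds the same matrix `m` so it runs on its own, and also shows the `drop = FALSE` argument, which keeps the result as a matrix instead of simplifying it to a vector:

```r
m <- matrix(c(1, 1, 2, 3), ncol = 2)

# An empty index means "all rows" or "all columns".
m[1, ]                      # 1 2  (first row)
m[, 2]                      # 2 3  (second column, as a vector)

# drop = FALSE keeps the two-dimensional structure.
dim(m[, 2, drop = FALSE])   # 2 1

# A single index treats the matrix as a vector in column order.
m[3]                        # 2
```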
There are a number of attributes that can be observed for a given matrix:
# View matrix attributes:
class(m)
length(m)
dim(m)
str(m)
summary(m)
You may add a name attribute to rows and columns.
# Naming rows and columns:
colnames(m) <- c('a', 'b')
rownames(m) <- c('c', 'd')
attributes(m)
m
A list is a one-dimensional object constructed by combining any objects, of any class and any dimensionality.
List position is denoted by [[x]].
[[1]]
| [1] | [2] | [3] | [4] |
|---|---|---|---|
| 1 | 1 | 2 | 3 |
[[2]]
| | [ ,1] | [ ,2] |
|---|---|---|
| [1, ] | 1 | 2 |
| [2, ] | 1 | 3 |
[[3]]
| | [ ,1] | [ ,2] |
|---|---|---|
| [1, ] | “1” | “2” |
| [2, ] | “one” | “3” |
# List of a numeric vector and matrices:
exampleList <- list(numericVector, m, messyMatrix)
exampleList
## [[1]]
## [1] 1 1 2 3
##
## [[2]]
## [,1] [,2]
## [1,] 1 2
## [2,] 1 3
##
## [[3]]
## [,1] [,2]
## [1,] "1" "2"
## [2,] "one" "3"
# List indexing:
exampleList[[2]]
## [,1] [,2]
## [1,] 1 2
## [2,] 1 3
exampleList[[2]] == m
m[2,2]
exampleList[[2]][2,2]
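Lists also accept single brackets, which behave differently from the double brackets used above. As a brief sketch (with an illustrative list named `lst`), `[ ]` returns a smaller list, while `[[ ]]` extracts the item itself:

```r
lst <- list(c(1, 1, 2, 3), matrix(c(1, 1, 2, 3), ncol = 2))

# Single brackets return a list containing the selected item.
class(lst[1])       # "list"
length(lst[1])      # 1

# Double brackets return the item itself.
class(lst[[1]])     # "numeric"
length(lst[[1]])    # 4

# Once extracted, the item can be indexed in its own way.
lst[[2]][2, 2]      # 3
```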
Attributes of lists that we are typically interested in include:
# Attributes of a list:
class(exampleList)
length(exampleList)
str(exampleList)
# Attributes of list items:
class(exampleList[[1]])
length(exampleList[[1]])
Attributes can be added to lists.
# Adding attributes to a list:
exampleList
names(exampleList)
names(exampleList) <- c('numericVector', 'm', 'messyMatrix')
attributes(exampleList)
Lists can be indexed by their names attribute using double-bracket notation or the $ operator.
# Lists can be indexed by name using the notation:
exampleList[[3]]
exampleList[['messyMatrix']]
exampleList$messyMatrix
A data frame is a two-dimensional object constructed by combining vectors of equal length as columns.
Each value in a data frame has a row and column position, denoted by “[x, y]”
| | [ ,1] | [ ,2] |
|---|---|---|
| [1, ] | 1 | 2 |
| [2, ] | 1 | 3 |
# Generate a data frame:
df <- data.frame(a = c(1, 1), b = c(2, 3))
df
## a b
## 1 1 2
## 2 1 3
The vectors that are contained in a data frame may be of different classes.
# Generate a data frame of different vector classes:
data.frame(a = c('one', 'one'), b = c(2, 3))
## a b
## 1 one 2
## 2 one 3
But within each column, values are still coerced to a single class!
# Attempt to generate a data frame with heterogeneous vectors:
messyDf <- data.frame(a = c(1, 'one'), b = c(2, 3))
messyDf
## a b
## 1 1 2
## 2 one 3
Values in a data frame have a row (x) and column (y) position, denoted by “[x, y]”
| | [ ,1] | [ ,2] |
|---|---|---|
| [1, ] | 1 | 2 |
| [2, ] | 1 | 3 |
# Index by row (x) and column (y) position [x,y]:
df[1,1]
df[2,2]
df[1:2,2]
There are a number of attributes that can be observed for a given data frame:
# View data frame attributes:
str(df)
class(df)
length(df)
dim(df)
summary(df)
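Always check attributes before working with a data frame! In R versions before 4.0.0, data.frame() converted character vectors to factors by default, as in the first output below; from 4.0.0 onward the default is stringsAsFactors = FALSE.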
# View attributes of the messy dataframe:
str(messyDf)
## 'data.frame': 2 obs. of 2 variables:
## $ a: Factor w/ 2 levels "1","one": 1 2
## $ b: num 2 3
dfStrings <- data.frame(
a = c(1, 'one'),
b = c(2, 3),
stringsAsFactors = FALSE
)
str(dfStrings)
## 'data.frame': 2 obs. of 2 variables:
## $ a: chr "1" "one"
## $ b: num 2 3
Column names are set when a data frame is created. If you do not supply them, R generates awkward names from the expressions themselves:
# Set and unset names:
data.frame(a = c(1, 1), b = c(2, 3))
## a b
## 1 1 2
## 2 1 3
data.frame(c(1, 1),c(2, 3))
## c.1..1. c.2..3.
## 1 1 2
## 2 1 3
Similar to other objects, the names attribute can also be set manually after an object is created:
# Set names after the data frame is created:
exampleDf <- data.frame(c(1, 1),c(2, 3))
names(exampleDf) <- c('hello', 'world')
exampleDf
## hello world
## 1 1 2
## 2 1 3
Data frames can be indexed by their names attribute using bracket notation or the $ operator.
# Index a data frame by column name:
exampleDf['hello']
exampleDf$hello
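The two forms above do not return the same kind of object. A minimal sketch, using an illustrative data frame `d` built the same way as the example:

```r
d <- data.frame(hello = c(1, 1), world = c(2, 3))

# Single brackets with a name return a one-column data frame.
class(d['hello'])      # "data.frame"

# Double brackets and $ return the column as a vector.
class(d[['hello']])    # "numeric"
class(d$hello)         # "numeric"

# Row and column positions can be mixed with names.
d[2, 'world']          # 3

# A logical test on a column selects matching rows.
d[d$world > 2, ]       # the second row only
```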
A tibble is a special type of data frame provided by the tibble package, which is part of the tidyverse.
# Load tidyverse package(s):
library(tidyverse)
# Generate a tibble data frame (tibble() replaces the deprecated data_frame()):
tibbleDf <- tibble(a = c(1, 'one'), b = c(2, 3))
tibbleDf
## # A tibble: 2 x 2
## a b
## <chr> <dbl>
## 1 1 2
## 2 one 3
Base R data frames can also be converted to a tibble.
# Convert a data frame to a tibble (as_tibble() replaces the deprecated tbl_df()):
as_tibble(messyDf)
## # A tibble: 2 x 2
## a b
## <fctr> <dbl>
## 1 1 2
## 2 one 3
as_tibble(data.frame(a = c(1, 'one'), b = c(2, 3)))
## # A tibble: 2 x 2
## a b
## <fctr> <dbl>
## 1 1 2
## 2 one 3
How do tibbles differ from Base R data frames?
# Compare tibble and base R data frame:
data.frame(a = c(1, 'one'), b = c(2, 3))
## a b
## 1 1 2
## 2 one 3
tibble(a = c(1, 'one'), b = c(2, 3))
## # A tibble: 2 x 2
## a b
## <chr> <dbl>
## 1 1 2
## 2 one 3
# Load the built-in mtcars dataset and compare how each object prints:
data(mtcars)
mtcars
as_tibble(mtcars)
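Printing is one difference; subsetting is another. The sketch below assumes the tibble package is installed and uses illustrative names (`baseDf`, `tb`). It shows that selecting a single column with `[ , ]` simplifies a base data frame to a vector, while a tibble stays a tibble:

```r
library(tibble)

baseDf <- data.frame(a = c(1, 1), b = c(2, 3))
tb <- as_tibble(baseDf)

# A tibble is still a data frame.
is.data.frame(tb)         # TRUE

# Base R simplifies a single column to a vector.
class(baseDf[, 1])        # "numeric"

# A tibble keeps its structure when subset with [ , ].
is.data.frame(tb[, 1])    # TRUE

# Use [[ ]] or $ to pull a vector out of a tibble.
class(tb[['a']])          # "numeric"
```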
By next Wednesday, please complete this worksheet.